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Abstract 

Air pollution is a great concern because of its impact on human 
health and on the environment. Statistical models play an impor- 
tant role in improving knowledge of this complex spatio-temporal phe- 
nomenon and in supporting public agencies and policy makers. We 
focus on the class of hierarchical models that provides a flexible frame- 
work for incorporating spatio-temporal interactions at different hierar- 
chical levels. The challenge is to choose a model that is satisfactory in 
terms of goodness of fit, interpretability, parsimoniousness, prediction 
capability and computational costs. In order to support this choice, 
we propose a comparison approach based on a set of criteria summa- 
rized in a table that can be easily communicated to non-statisticians. 
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Our proposal - simple in principle but articulated in practice - holds 
true for many environmental phenomena where a hierarchical struc- 
ture is suitable, a large-scale trend is included and a spatio-temporal 
covariance function has to be chosen. 

We illustrate the details of our proposal through a case study con- 
cerning particulate matter concentrations in Piemonte region (Italy) 
during the cold season October 2005-March 2006. From the evalua- 
tion of the proposed criteria for our case study we draw some conclu- 
sions. First, a model with a complex hierarchical structure is globally 
preferable to one with a complex spatio-temporal covariance func- 
tion. Moreover, in the absence of suitable computational resources, a 
model simple in structure and with a simple covariance function can 
be chosen, since it shows good prediction performance at reasonable 
computational costs. 

Keywords: Particulate Matter PMio, hierarchical models, spatial 
mapping, spatio-temporal covariance function, prediction performance 
indexes. 

1 Introduction 

Air quality is jeopardized by the presence of several pollutants. Particulate 
matter (PM) is one of the most critical air pollutants in Europe and, despite 
the improvements thanks to European Union legislation, it still has a heavy 
toll on human life (Harrison et ai, 2008). From a statistical perspective, 
many articles have been proposed for modelling the concentration of PM 
(and of other pollutants) and understanding its underlying complex spatio- 
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temporal dynamics. In particular, almost all the works propose spatial pre- 
diction techniques in order to obtain concentration maps useful for evaluating 
the health risk and assessing compliance with European and national direc- 
tives, even in places where no measurement stations are located (e.g. Fasso 
et ai, 2007; Pollice and Jona Lasinio, 2010b; Sahu and Nicolis, 2007; Sahu 
et ai, 2006; Zidek et ai, 2002). Moreover, some works also develop methods 
for temporal forecasting (such as, for example, Sahu and Mardia, 2005 and 
Smith and Kolenikov, 2003) or consider more critical pollutants at the same 
time (e.g. Al-Awadhi and Al-Awadhi, 2006; Pollice and Jona Lasinio, 2010a; 
Shaddick and Wakefield, 2002). When explanatory variables are available, 
measured by a monitoring network or simulated by a deterministic model, 
they contribute to the mean structure of the model, also known as large-scale 
component in geostatistical literature (see, for example, Cressie, 1993). To 
this regard sensitivity analysis techniques can be used to understand the role 
of covariates with respect to the output uncertainty, as shown in Cocchi et al. 
(2007) and Fasso and Cameletti (2010). 

The common characteristic of many models found in literature is their 
hierarchical structure. This means that they are constructed by putting 
together conditional sub-models defined at each hierarchical stage. With ref- 
erence to likelihood, this corresponds to taking a conditional point of view for 
which the joint probability distribution of a spatio-temporal process can be 
expressed as the product of certain simpler conditional distributions. This 
property makes it possible to deal with the complexity of spatio-temporal 
processes in a straightforward way which is the reason why hierarchical mod- 
els have become so popular for modelling environmental processes, especially 

3 



from a Bayesian perspective (Clark, 2005; Wikle, 2003; Wikle et al, 1998). 

With particular reference to air pollution, covariates, such as meteorolog- 
ical and orographical variables, play an important role in seizing the large- 
scale variation of data, the influence of meteorology and geographical factors 
on pollutant concentration being well-known. Residual variability, also called 
small-scale variation, is modelled by a spatio-temporal process defined at a 
particular level of the hierarchy and by a spatio-temporal covariance function. 
In this context, an interesting question arises concerning the best hierarchical 
structure that can be combined with the spatio-temporal covariance function 
in order to seize the spatial and temporal dynamics of the considered pro- 
cess. For example, is it preferable to have a two-level model with a complex 
nonseparable covariance function or a model with a more complicated hier- 
archical structure but a simpler covariance function? Finding an answer to 
this question might be useful in order to provide environmental agencies with 
an effective statistical model for building reliable PM concentration maps, 
equipped with the corresponding uncertainty measure. 

The goal of this paper is to provide an instrument to answer to this ques- 
tion. Therefore, we propose certain criteria that objective as possible 
for comparing spatio-temporal models for air quality data. Such criteria 
consider both goodness of fit and model complexity as well as computational 
costs. Actually, a model comparison for space-time models is proposed by 
Huang et al. (2007) through AIC and BIG criteria, which are investigated in 
their practical behaviour since asymptotic theory of these indexes for geosta- 
tistical models is substantially missing in literature. Moreover, these authors 
use mean squared prediction error at a fixed time to compare prediction ca- 
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pability. Specifically for fine particulate matter, Pang et al. (2010) compare, 
on practical grounds, ordinary kriging with Bayesian maximum entropy tech- 
nique, as implemented in SEKS-GUI software through averaged estimation 
errors and error variances at four validation sites (Kolovos et al., 2006). 

Hence, in the absence of a space-time model selection theory, we first 
discuss the hierarchical models considered and then propose a set of em- 
pirical criteria to compare their intrinsic and computational complexity and 
their spatial prediction capability (through suitable indexes for air quality 
models opportunely summarized in a qualitative index). To implement our 
proposal, we deal with a case study concerning particulate matter in the 
Piemonte region (Italy). Here, we compare models on air quality real data; 
nonetheless, the proposed comparison approach holds true for general envi- 
ronmental phenomena where a hierarchical structure is suitable, a large-scale 
trend is included and a spatio-temporal covariance function has to be chosen. 

The paper is organized as follows. In Section 2 we introduce six models 
to be compared, specifying their hierarchical structure and spatio-temporal 
covariance function. The models, which constitute an extensive class of 
spatio-temporal hierarchical models, are fully discussed in Section 3. Sec- 
tion 4 describes the criteria used to establish which model best describes the 
data. In particular, we consider the intrinsic and computational complexity 
as well as the spatial prediction capability of each model. The paper also 
features an application proposed in Section 5 regarding PM data measured in 
Piemonte during the 2005-2006 winter season. Unlike Huang et al. (2007) we 
implement all the models in a fully Bayesian framework via Markov Chains 
Monte Carlo (MCMC) methods. The paper ends with a discussion of the 
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results and the conclusions. The Appendices contain details about the full 
conditional and predictive distributions involved in the model estimation and 
spatial prediction procedures which may help. 



2 Hierarchical spatio-temporal models 

Let z{si,t) be the scalar spatio-temporal process observed at site Si and at 
time t where i = 1, . . . ,d and t = 1, . . . ,T. 

We assume the following measurement equation as the first level of the hier- 
archical models: 



where e{si,t) ~ (0, o"^) is the measurement error defined by a Gaussian 
white-noise process, serially and spatially uncorrelated. The term u{si,t) 
is the so-called state process and can be defined, in turn, by other sub- 
levels giving rise to different hierarchical models described in the following 
subsections. 



Model A is a two-level regression model characterized by a large-scale term 
and a spatio-temporal process for the residual small-scale component. In 
particular, quantity u (sj, t) of Eq.(l) is given by the sum of a trend /i (sj, t) 
and a random process u (s,, t), as follows: 




(1) 



2.1 Model A 




(2) 
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Trend {si, t) is a function of k covariates and is given by 



fi{si,t)=X{s„t)f3 (3) 

where X = (Xi ,Xk denotes the covariate vector for 

site Si at time t and /3 = . . . , is the coefficient vector. 

The zero-mean Gaussian process u (sj, t) of Eq.(2) is the residual process 
whose spatio-temporal covariance function depends on the parameter vector 
6, namely 

Cov {io {s,, t) , u (s„ t')) = crlCe {h, I) \/t ^ j,t ^ t' (4) 

where o"^ is the constant in time and space variance of the process and 
C0(., .) is the spatio-temporal correlation function parameterized by 9, with 
h = \\si — SjW the Euclidean distance between sites i and j, I = \t — 1'\ the 
temporal lag between t and t'. As the covariance function (4) only depends 
upon h and /, the process is supposed to be second-order stationary and spa- 
tially isotropic. We consider three different forms for the covariance function, 
which give rise to the following models. 

Model Al Under the hypothesis that the process u (sj, t) is i.i.d. over time, 
it holds that 

I if t^t' 

Cov {u{si,t) ,u{sj,t')) = < (5) 

[alCeih) z/ t = t' 

where Ce{h) is a purely spatial correlation function. To this regard 
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many functions defining isotropic second-order stationary spatial pro- 
cesses can be found in Banerjee et al. (2004, Ch.l). 

Model A2 Adopting a separable approach, the space-time covariance func- 
tion factors into a purely spatial and a purely temporal component and 
is given by 



with 9 = {9i, 62). In this case there is no interaction between space and 
time. 

Model A3 Adopting a nonseparable approach, a space-time correlation func- 
tion can be defined in a general form, as introduced in Gneiting (2002): 



The function ?/;(x),x > 0, is any completely monotone function while 
^{x),x > 0, is any positive function with a completely monotone 
derivative. In particular, we consider the two cases described in Ta- 
ble 1, corresponding to Model A3-1 and Model A3-2 respectively, 
where < b < 1, u > 0, the smoothness parameters a and 7 take val- 
ues in (0, 1] and the space-time interaction parameter r of Model A3-2 
is defined in [0, 1]. Finally, the scaling parameters of time and space a 
and c are positive. It follows that ^^as-i = (a, b, 7, a, c) for Model A3-1 
and 6*^3-2 = («, t, 1, ^, c) for Model A3-2. 



Ce{h,l) = Ce, {l)Ce, (h) 



(6) 
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2.2 Model B 

Differently from Model A, Model B has a three-level hierarchical structure 
due to an additional equation required for modeling the temporal dynamics 
of a constant in space latent process. This means that for Model B the term 
u (sj, t) of Eq.(l) is given by the sum of a trend /i (sj, t), of a unidimensional 
latent process y (t) and of a purely spatial process u (sj), as follows: 

u{si,t) = fi{si,t) + y{t) +uj{si) , (8) 

where y{t) and u (sj) are uncorrelated. In Eq.(8) the trend /i (sj, t) is defined 
as in Eq.(3), while the term y [t) refers to a constant in space unidimensional 
process with a temporal dynamics, with an autoregressive coefficient p, given 

by 

y{t)=py{t-l)+r]{t) (9) 

where y (0) ~ (0, a^) and 77 (t) ~ (O, a^) are uncorrelated. Finally the 
spatial process u (sj) is assumed to follow a Gaussian distribution A^ (0, a^) 
whose spatio-temporal covariance function is Cov {u (sj) , u (sj)) = cr'^Cg (h). 

2.3 Model C 

Model C is a three-level hierarchical model defined by a spatio-temporal 
residual process with temporal dynamics. Specifically the term u{si,t) of 
Eq.(l) is given by a trend p{si,t) and a spatio-temporal process y{si,t), 
namely 

u{si,t) = p{si,t) + y {si,t) . (10) 
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The trend fi{si,t) has the same structure as in Model A and B (see Eq.(3)) 
while y is a spatio-temporal process that changes in time according to 
the following dynamics with autoregressive coefficient p: 

y{si,t)=py{si,t-l)+u{si,t) (11) 

where y (sj, 0) ~ (0, cr^). Finally, u (sj, t) ~ (0, cr^) is a spatio-temporal 
process i.i.d. over time, so that its spatio-temporal covariance function is 
given by Eq.(5). Obviously y and uj are uncorrected. 

3 Model discussion 

All the models are characterized by the same measurement equation given by 
(1) and by the trend term in Eq.(3) defined as a function of some covariates 
that can change in space and time (e.g. meteorological variables) or can 
be constant in time (e.g. spatial coordinates). The six models we consider 
differ in the way the residual detrended process is modelled and in how the 
spatio-temporal correlation is treated. 

Model A, described in Section 2.1, features a very simple structure charac- 
terized by a unique residual spatio-temporal process for which three different 
covariance functions are considered. The most complex case is represented 
by Model A3, which is characterized by a nonseparable spatio-temporal co- 
variance function. Considering that space and time interact together, from 
a physical point of view it is reasonable to adopt a nonseparable approach 
even if it has certain computing drawbacks. In fact, the size of the variance- 
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covariance matrix of the uj {si,t) process is {dT x dT) and, especially in the 
case of a dense network or a long monitoring period, matrix operations be- 
come infeasible from a computational point of view. Model Al and Model A2 
introduce some simplifications: Model Al supposes that the process u 
is i.i.d. over time, which leads to a purely spatial covariance function (see 
Eq.(5)), while Model A2 is based on the separability hypothesis (see Eq.(6)). 
The choice of a spatio-temporal covariance function to be used depends on 
certain considerations. First of all, as stated in Cressie and Huang (1999), 
separable models are often chosen for convenience rather than for their ability 
to properly fit the data; basically, the same holds true for the i.i.d. over time 
hypothesis case. This is mainly related to the computational advantages in 
implementing and estimating a model with these simplified functions, which 
usually depend on a small number of parameters and involve smaller matri- 
ces. Generally speaking, a spatial covariance function that does not depend 
on time - the i.i.d. case - can be used when it is possible to show, for example 
by means of daily variograms, that the spatial correlation does not change 
significantly in time. If this is not the separability test (e.g. Fuentes, 

2006) should be performed in order to verify if the separability hypothesis 
can be assumed. Otherwise, a nonseparable covariance function should be 
used. 

Model B and C differ from Model A for their three-level structure: in both 
cases, in fact, an equation is introduced for modelling the temporal dynamics 
of a latent process. In particular, for Model B this is given by a purely 
temporal AR(1) process (defined in (9)) which is supposed to be constant in 
space, while for Model C it has an AR(1) structure with innovations i.i.d. 
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over time (see Eq.(ll)). It is important to point out that the size of the 
latent process is always unidimensional for Model B while for Model C it is 
defined by the number of spatial sites d. From a computational perspective 
this means that estimation procedure costs are higher for Model C than for 
Model B, even if Model B has an extra parameter to estimate, i.e. a^. 

It is interesting to point out that both the spatio-temporal covariance 
functions of Model B and Model C can be rewritten in a separable form, 
additive and multiplicative respectively (for further details see Appendix A). 

4 Model comparison 

We are interested in determining which is the most effective model for fitting 
the data. To achieve this goal, we compare the six models using a set of em- 
pirical criteria which explore the model complexity and prediction capability, 
as described hereafter. 

4.1 Intrinsic and computational model complexity 

The intrinsic complexity of a model can be roughly defined as the number of 
parameters to be estimated. Generally, as the number of parameters increases 
the estimation procedure becomes more complex since, in the fully Bayesian 
framework we adopt, steps are added to the Gibbs sampling algorithm. More- 
over, if the parameter estimation requires the use of the Metropolis-Hastings 
(MH) algorithm, since no closed-form full conditional posterior distributions 
are available, the algorithm can become unstable and requires a larger num- 
ber of iterations in order to reach convergence. This underscores out how a 
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richer model, in terms of parameters and hierarchical structure, is necessarily 
more complex from a computational point of view. To this regard, we also 
consider the size of the biggest matrix to be inverted for each model. Con- 
sidering that matrix inversion is of order in computation (where n is the 
total number of data), a considerable size can give rise to massive computa- 
tional loads infeasible to be carried out. To this regard, just to have an idea, 
consider that Matlab^ on an Intel Core 2 Duo Mac (2.4 Ghz, 4GB RAM) 
takes about 15 and 76 seconds to invert - using the Cholesky factorization - a 
full symmetric matrix with size (4500 x 4500) and (8000 x 8000), respectively. 
The same operations require about 5 and 21 seconds on an Intel Xeon 8 CPU 
cluster (2.66 Ghz, 8 GB RAM). In the hypothetical case of one parameter 
and 100000 iterations required for the convergence of the algorithm, at least 
5 or 24 days would be necessary for the implementation of the 4500 or 8000- 
dimensional cases, respectively. Obviously, these computing times increase 
(in a non linear way) as the parameter set becomes larger. 

This information about intrinsic and computational model complexity is 
shown in the first rows of Table 6 (pag. 53)^. It is clear that the nonsepa- 
rable models. Model A3-1 and Model A3-2, are the most complex ones since 
they have the biggest parameter vectors (respectively 7 and 8 parameters, 
excluding the /3's), all estimated using the MH algorithm. Moreover, the size 
of the variance-covariance matrix is [dT x dT). This means that, from a 
computational point of view, nonseparable models are extremely expensive 
and their implementation is expected to be severely time consuming. 
^We use Matlab R2009b with the Parallel Computing Toolbox. 

^We suppose that for Model Al, B and C the parameter vector 6 is unidimensional and 
that for Model A2 the parameter vectors 6i and 62 are unidimensional. 
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Model A2 has 4 parameters to be estimated and, thanks to the separa- 
bihty of the spatio-temporal covariance function, it enjoys the properties of 
the Kronecker product. This results in certain computational advantages re- 
garding the inverse and the determinant of the {dT x dT) variance-covariance 
matrix, because we deal with {d x d) and (T x T) matrices separately (for 
details see Section B.2). In Table 6 the size of the biggest matrix to be in- 
verted is (T X T) since we consider a small monitoring network, such that 

In terms of complexity Model Al, Model B and Model C are more suitable 
because they have small parameter vector, make use of the MH algorithm in a 
limited way and are characterized by {d x (i)-dimensional variance-covariance 
matrix. This means much more computationally manageable models. 

With respect to the computational complexity, the models are compared 
also considering the computing time required to estimate the parameters and 
for performing the spatial predictions over the validation stations. The times 
are evaluated per iteration of the MCMC run and are computed using the 
above-defined Intel Xeon 8 CPU cluster. Generally, it is clear that, keeping 
all the rest equal, a model that can be quickly implemented is more desirable. 

4.2 Spatial prediction capability 

As the aim of the modelling is prediction, we compare models on the basis 
of their spatial prediction capability which is evaluated using certain perfor- 
mance indexes computed on validation stations. In particular, we consider 
five indicators based on the differences between predicted and observed data 
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together; moreover, we compute the observed coverage probabihty. More 
precisely, together with the usual root mean square error (RMSE) and the 
correlation coefficient p, we adopt the Normalised Mean Bias Factor (NMFB) 
recently introduced by Yu et al. (2006) and two indexes, named WNNR and 
NNR, proposed by Poll and Cirillo (1993) and defined on the Normalized 
Ratios between the predicted and the observed values. 

For a fixed site s,, let Zt be the observed time series and zt the predicted 
time series with t = 1, . . . ,T (see Appendix C for details about prediction); 
moreover, denote with z and z the corresponding mean values. The Nor- 
malised Mean Bias Factor is defined by 



NMBF 



^ - 1 ifz> z 

1-^ \il<z 



NMBF is defined on M and has the advantage of both avoiding infiation and 
asymmetry, two problems discussed in Yu et al. (2006). 

The Weighted Normalised mean square error of the Normalised Ratios is 
defined by 

WNNR 



while the non-weighted one is 



NNR 



Et(l - kt] 



where St = Zt/z is the weight and kt = exp — \\n{zt/z-t)\ is the normalised 
ratio. WNNR and NNR are both positive and have the advantage of taking 
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properly into account the peaks of observed data (see the discussion in Poh 
and Cirillo, 1993). 

These 5 indexes and the observed coverage probabihty are computed for 
all the validation stations and for each model. Successively, these 6 perfor- 
mance measures are summarized over stations in a qualitative index based 
on "stars category", so that more stars correspond to a better prediction 
capability of the model (see Table 6 on page 53). 

5 Model comparison for PMio in Piemonte 

In order to compare the models described in Section 2, below we consider 
particulate matter concentration with an aerodynamic diameter of less than 
10 fim (PMio, in fig/m^) measured in the Piemonte region (Italy) during 
the October 2005 - March 2006 winter season. Piemonte is situated in the 
western part of the Po river basin, surrounded on three sides by the Alps. The 
pollutant dispersion is strongly influenced concurrently by the shelter effect 
of the Alps and by meteorological features which depend on the complex 
orography of the region. So, for example, we can have weak winds and 
stagnation in the central part of the region or breezes and foehn winds in 
mountains and valleys. For these reasons, we usually observe lower PMio 
concentration near the Alps, whereas higher pollution levels are detected in 
plains closer to urban areas. Actually, as expected, air quality is worse in 
urbanized areas where the most important emission sources, in other words 
industrial sites and high traffic levels, are located. 
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5.1 Data description 

We analyze daily PMio data measured by a network of c? = 24 sites (see red 
triangles in Figure 1 and the corresponding labels in Figure 2) for T = 182 
days (data are provided by an information system called AriaWeb Regione 
Piemonte) . Moreover, we set aside data from 10 sites for validation purposes 
(see blue dots in Figure 1). The 24 sites are selected so that the amount of 
missing data does not exceed 20% and the missing data is not sequential. 
This guarantees good spatial coverage of the monitoring network so that 
stations can also be found in rural plain areas as well as in urbanized alpine 
valleys. 




Figure 1: Locations of the 24 PMio monitoring sites (red triangles) and 10 
validation stations (blue dots). The complete names of the 24 stations are 
given in the x-axis labels of Fig. 2. The labels of the 10 validation sites are: 
25 Biella - Largo Lamarmora, 26 Borgo San Dalmazzo, 27 Bra, 28 Chivasso, 
29 Ivrea, 30 Saliceto, 31 Serravalle Scrivia, 32 Susa, 33 Torino - P.zza Rivoli, 
34 Torino - Via Gaidano. 



Figure 2 shows the distribution of the PMio concentration by stations. 
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We can see that most of the stations have average and median concentration 
levels above 50 fig/m^, which is the threshold set by the European Com- 
mission (2008/50/EC directive) that can be exceeded no more than 30 days 
a year. All the distributions are positive skewed due to the occurrence of 
extremely polluted days. This kind of situation is quite common in the Po 
Valley, especially during the winter season when relatively stable atmospheric 
conditions, associated with a reduced washout of particulate matter, give rise 
to higher concentration. 

□ <50% o ^ 

□ 50%-80% o 




Figure 2: PMio concentration distribution over stations (the bold horizontal 
line corresponds to the median while the bold dot to the mean). The colours 
of the boxplots are given by the area type, i.e. the percentage of built-up 
surface (see the legend). 

According to European legislation, each site is classified by area-type 
which has three categories (rural, suburban and urban) depending on the 
level of urbanisation of the area. More precisely, in Piemonte a site is classi- 
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fied rural if the percentage of built-up surface within a 1 km radius around 
it is less than 50%, suburban if that percentage is between 50% and 80%, 
urban if otherwise. Looking at Figure 2, area-type does not seem to have 
a visible effect on PMio concentration. In fact, we observe rural stations 
with high levels of pollution as well as urban locations with lower concentra- 
tion. This can seem unusual since area-type can be considered as a proxy of 
anthropogenic activities, but it could happen since PMio pollution is a com- 
plex phenomenon strongly related to meteorology and orography, especially 
during the winter season. 

Indeed, our models share a common trend component where some mete- 
orological and orographical variables appear. The first ones are time-varying 
covariates obtained from a nested system of deterministic computer-based 
models implemented by the environmental agency ARPA Piemonte (Bande 
et ai, 2007; Finardi et ai, 2008). Such models provide the estimates on 
a regular 4 km x 4 km grid of some meteorological variables, turbulence 
and chemicals parameters. By means of a preliminary regression analysis 
using Akaike's information criteria (AIC) and parameter significance, we 
choose the following covariates: daily maximum mixing height (HMIX, in 
m), daily total precipitation (PREC, in mm), daily mean wind speed (WS, 
m/s), daily mean temperature (TEMP, in K) and daily emission rates of 
primary aerosols (EMI, in g/s). The mixing height is one of the fundamen- 
tal parameters that characterize the structure of the atmosphere near the 
ground. Low mixing heights mean that the air is generally stagnant with 
very little vertical motion, and therefore pollutants are usually trapped near 
the ground surface. High mixing heights allow vertical mixing within a deep 
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layer of the atmosphere and good dispersion of pollutants. Thus a negative 
relationship is expected between PMio and HMIX. Precipitation plays a cru- 
cial role in explaining PMio variations taking into account the wet removal 
of aerosol suspended in the atmosphere. Moreover, the daily mean values 
of wind speed are used in order to take into account the pollutant removal 
due to strong wind episodes, often combined with foehn conditions, that 
frequently occur in Piemonte during the winter time. Mean temperature in- 
fluences the dispersion and accumulation of pollutants. On the one hand, it 
is related to atmospheric photochemical reactions and, consequently, to the 
production of secondary aerosols. On the other, low temperatures near the 
ground are often related to strong thermal inversion, one of the atmospheric 
features responsible for heavy pollutant events in urban area. Moreover, it is 
well-known that low temperatures cause an increase in particulate emissions 
from vehicle traffic sources. Finally, emissions take into account informa- 
tion about anthropogenic activities (e.g. energy production, domestic and 
industry production, road transport) which are the main sources of primary 
pollutants and the precursors of secondary pollutants. Together with these 
time-varying covariates, we also consider altitude (A, in m) and coordinates 
(UTMX and UTMY, in km) in order to take into account the orography of 
the region. 

In order to stabilize the variances, which increase with the mean values, 
and make the distribution of PMio data approximately normal, we adopt the 
logarithm transformation of PMio data. Subsequently, in order to investigate 
spatial and temporal correlation, we fit a simple regression model over all the 
4368 (=24 X 182) PMio data with the covariates - standardized - described 
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Figure 3: Spatial residual correlation cloud and lowess curve. 

above. Figure 3 illustrates the residual correlation cloud, that is the set of 
spatial correlations between sites at different distances. It shows that, even 
after removing the so-called large-scale component given by the covariates, 
a spatial correlation still remains: the lowess curve decreases slowly with 
distance and settles around 0.6 at 100 km. Figure 4 shows the boxplots of 
empirical autocorrelations calculated on the residuals - over stations - and it 
stands out that a temporal structure also remains, with a temporal correla- 
tion of about 0.6 at the first time lag. These results show great evidence of 
the importance of using a spatio-temporal model for catching the complex 
structure and dynamics of the phenomenon. 

5.2 Model implementation details 

The models presented in Section 2 are now fitted on the PMio concentration 
data described in the previous section. 
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Figure 4: Boxplots of the residual empirical autocorrelations computed over 
the 24 stations. 

The covariate vector X (sj, t) of Eq.(3), which also includes a constant for 
the intercept term, is given by 

X{s„t) = (^l,A{s,),UTMX{s,),UTMY{s,),WS{s„t), 

HMIX {si, t) , TEMP (si, t) , PREC {si,t) , EMI {s^ t) ^ 

where i = 1, . . . , 24 and t = 1, . . . , 182. 

With reference to the covariance functions, we adopt a purely spatial 
exponential form given by Cg {h) = exp {—Oh) for Model Al, B and C and a 
double exponential structure for Model A2, Cg {h, I) = exp {—Oil) exp (—02^)- 
For Model A3-1 and A3-2 we refer to the nonseparable functions defined in 
Table 1. 

As regards inference, i.e. parameter estimation and spatial prediction, we 
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adopt a fully Bayesian approach via Monte Carlo Markov Chain (MCMC) 
methods. In particular, we use the Metropolis-within-Gibbs algorithm im- 
plemented in Matlab by ad-hoc written code. The complete details of the 
full conditional and predictive distributions are given in Appendices B and C. 
Convergence is diagnosed by monitoring the mixing of the chains by means 
of traceplots together with autocorrelation and kernel density plots. 

With reference to the prior specification, we assume Normal independent 
priors A^(0, 100) for each component of the f3 vector and Inverse Gamma 
distributions /G(2, 1) for the variance parameters (o"^,ct^ and cr^). A Uni- 
form prior distribution f/(0, 1) is chosen for the parameter 6 of Model Al, 
B and C, and the parameter 62 of Model A2: this corresponds to a spatial 
correlation between 0.37 and 1 at 1 km of distance and between and 1 
at the maximum distance of 190 km. For the temporal decay parameter 61 
of Model A2 we adopt a Uniform prior distribution f/(0.3,3) whose support 
corresponds to a temporal range between 1 and 10 days (the range is deter- 
mined using the relationship exp{—6il) ^ 0.05). For the parameters of the 
nonseparable correlation function of Model A3-1 and A3-2, we suppose vague 
prior distributions, that is U{0, 1) for a, b, 7, r and U (0, 10) for a, c, 7. 

Note that for Model A2 it is not possible to marginalize the posterior 
distribution over the latent process u{si,t) (see Appendix B for details); 
thus, for sampling it we employ the en bloc procedure described in Sahu 
et al. (2006). Finally, for Model B and C the latent process y is sampled from 
its full conditional distribution by the Forward Filtering Backward Sampling 
(FFBS) algorithm as described in West and Harrison (1997) . 
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5.3 Results and discussion 

Table 2 displays the posterior estimates for the covariate coefficient vector 
p. Note that they are robust with respect to the model specification, mean- 
ing that the choice of a spatio-temporal covariance function as well as the 
introduction of a temporal dynamics equation do not affect strongly the es- 
timates. In particular, the posterior mean of the intercept is around 3.9 on 
the log scale, which corresponds to an average pollution level of about 49.4 
fig/m^. As expected, a significant and positive relationship can be seen be- 
tween emissions (EMI) and PMio concentration. Moreover, the significance 
of the coefficients of WS, HMIX, TEMP and PREC confirms the importance 
of meteorological variables on air quality. The UTMX and UTMY coefficient 
estimates are both negative, indicating a decreasing spatial trend from West 
to East and from South to North. Finally, altitude (A) has a significant effect 
in reducing PMio concentration. 

With regards to the variance estimates in Table 3, we observe that more 
variation is explained by the spatial or spatio-temporal term rather than by 
the measurement error. For Model Al, A3-1, A3-2 and B, the estimate of cr^ 
is larger - between 1 and 3 times - than the estimate of while for Model A2 
and C it is, respectively, 16 and 74 times larger. The fact that for Model C the 
measurement error variance is so small with respect to the spatio-temporal 
error variance can be ascribed to its complex structure, clearly useful for 
explaining variability of the spatio-temporal process. 

Table 4 reports the posterior estimates of the correlation function coeffi- 
cients. For Model Al, the estimate of 9 is 0.0033 giving rise to a strong spatial 
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correlation which decreases slowly with distance. In fact, at a maximum dis- 
tance of 190 km the spatial correlation is about 0.54. For Model A2, we find 
6i = 0.492 and 62 = 0.032 that correspond to a temporal and spatial range 
of, respectively, 6 days and 95 km (the spatial range is determined using the 
relationship exp(— ^2^) ~ 0.05). Instead, for Model B we estimate a range 
of about 60 km, indicating that the spatial correlation decreases rapidly and 
that at 190 km there is no correlation. On the other hand, for Model B the 
estimate of the AR(1) temporal correlation coefficient p is 0.8313, confirming 
the short-term temporal persistence of particulate matter. Model C seems 
to have a spatial correlation similar to the one of Model Al (the posterior 
mean of 6 is 0.0022 with a correlation of about 0.66 at 190 km), while its 
temporal correlation is weaker (estimate of p equal to 0.6535) than the one 
of Model B. 

Posterior means and credible intervals of the nonseparable correlation 
function parameters (Model A3-1 and A3-2) are shown in Table 5. Using 
these estimates, the correlation functions defined in Eq.(7) and Table 1 are 
plotted in Figure 5 (with 0-10 days of temporal lag and 0-100 km of spatial 
distance). The two surfaces resemble each other, in other words the spatio- 
temporal correlation decreases similarly with respect to both the temporal 
lag and the spatial distance. 

As discussed in Section 3, the first rows of Table 6 provide information 
about intrinsic and computational model complexity. Moreover, the table 
contains estimation and prediction times (in seconds per iteration). As ex- 
pected, the estimation computing time increases with the number of param- 
eters and the size of the biggest matrix to be inverted: in particular. Model 
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Figure 5: Nonseparable spatio-temporal correlation function for Model A3-1 
(left) and Model A3-2 (right) with the estimated parameters in Table 5. 

A3-1 and A3-2 are the most time consuming (about 45 and 47 seconds, re- 
spectively) while the simple Model Al is the fastest to be estimated (0.014 
seconds per iteration). Focusing on Model B and C, both characterized by an 
additional equation for the temporal dynamics requiring the use of the FFBS 
algorithm (see (9) and (11)), we observe that Model B is less time-consuming 
than Model C, even if it has one additional parameter to estimate, a^, and 
adopts more extensively the MH algorithm. This can be explained by con- 
sidering that, while for Model B the FFBS algorithm involves 1-dimensional 
terms, for Model C it deals with (i-dimensional terms. Model A2 requires 
more time (about 4 seconds) than Model Al, B and C because it uses en 
bloc samphng for u{si,t), that is an iterative computationally expensive pro- 
cedure. 

Regarding prediction time. Model Al and B have very similar perfor- 
mances since the prediction procedure is almost the same (see Appendix C). 
Model C is slightly slower because it has a more complicated temporal dy- 
namics requiring the introduction of an additional step for the composition 
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sampling procedure given by Eq.(24). The prediction requires more time 
for Model A3-1 and A3-2 given that the routine involves variance-covariance 
matrices with size {dT x dT). The 27 seconds per iteration necessary for 
Model A2 could be a cause for worry, but they are justified by the fact that 
drawing from the Normal distribution, as done in Eq.(22), slows down the 
composition sampling procedure. 

The last row of Table 6 summarizes in "stars category" (from 1 to 3) the 
prediction capability of our models. The stars are based on the values of the 
performance measures defined in Section 4.2 and computed for the 10 valida- 
tion sites. Figure 6 shows through boxplots, for each model, the distribution 
of the indexes and of the observed coverage probability over station. The 
boxplots of Model A2 show the greatest variability and the largest values 
of RMSE, WNNR and NNR, meaning that the prediction is poorer where 
the observed time series have peaks. Thus, overall Model A2 has the worst 
prediction performance and it is classified with one star. Conversely, Model 
C boxplots show less variability, especially looking at NMFB, RMSE and the 
observed coverage probability. The other models seem to have very similar 
prediction performances, except for the observed coverage probability distri- 
butions of Model A3-1 and A3-2 that are left skewed (their first quartiles are 
around 0.87). For this reason we assign three stars to Model C and two stars 
to the remaining models. 

The entire Table 6 provides the set of criteria useful for choosing the 
most effective model in our case study. First of all, we discard Model A2 
because of its poor prediction performance and long computing time for 
spatial prediction. Also, we can not suggest implementing Model A3-1 and 
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NMFB RMSE WNNR 




NNR Pearson Observed coverage prob. 




Figure 6: Boxplots of the performance measure distributions computed for 
each model over the 10 vahdation stations. 

A3-2 - with the complex nonseparable covariance function - for reasons of 
computational costs that are not offset by a significant improvement of the 
prediction capability. Among the remaining models, Model B is the less 
appealing one since it has less stars than Model C and is slower than Model 
Al in the estimation routine. Finally, the choice is between Model Al - simple 
in structure and covariance function - and Model C, which differs in the 
hierarchy for an additional equation explicating the temporal dynamics. As 
observed. Model C has a slightly better prediction capability, which is offset 
by a larger computational cost. In the absence of suitable computational 
resources. Model Al has a good performance at a reasonable cost. 
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6 Conclusions 



In this work we propose a comparison of six models for air quality data, 
taking into account different levels of complexity either in the hierarchical 
structure or in the spatio-temporal covariance function. This makes it pos- 
sible to analyze a wide range of models suitable for complex environmental 
phenomena characterized by both a large and small scale of variation, as well 
as by spatial relationships, temporal dynamics and spatio-temporal interac- 
tions. In this work we focus on PM concentration, comparing models on the 
basis of certain criteria that take into account intrinsic complexity, computa- 
tional costs and spatial prediction capability. These criteria are particularly 
important because environmental agencies need to evaluate air quality sta- 
tus, predicting pollutant concentration at unmonitored sites, at a reasonable 
computational cost. The table that summarizes the proposed set of criteria 
- Table 6 in our case study - has the merit of being easy to communicate to 
non-statisticians, who may be environmental agency practitioners or policy 
makers, responsible for drawing up air quality legislations. Moreover, these 
spatial prediction maps can be used in risk assessment analysis and ecolog- 
ical risk models, as continuous exposure levels, or as validation criteria in a 
network design study. 

The compared models share a large-scale trend component whose esti- 
mated coefficient are robust as the model changes. Furthermore, this prop- 
erty allows models to be compared only through the residual detrended pro- 
cess. First, this component is modelled by directly specifying certain spatio- 
temporal correlation functions with increasing complexity, from the i.i.d. over 
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time case to the nonseparable form (Model Al, A2, A3-1 and A3-2). Then, in 
Model B the detrended process is defined as the sum of a constant in space 
process, with an AR(1) temporal dynamics, and a purely spatial process. 
Model C, on the other hand, features a spatial process which evolves in time 
according to an AR(1) equation with innovations i.i.d. over time. 

Our case study, concerning the prediction of PMio in Piemonte, allows us 
to identify the best combination of hierarchical structure and spatio-temporal 
covariance function based on the evaluation of the proposed criteria. The ap- 
plication highlights that complex spatio-temporal covariances (in Model A3-1 
and A3-2) require computational costs which are too high with respect to the 
observed prediction capability. The separable Model A2 is discarded because 
of its poor performance in the predicting procedure (one star). Model B, on 
the other hand, is comparable to Model Al with respect to the prediction 
capability (two stars) but it is slower from a computational point of view. 
Model C is the only three-star predictor: this suggests that, in our case 
study, a model with a complex hierarchical structure is globally preferable 
to one with a complex spatio-temporal covariance function. Thus, the last 
comparison is between Model Al and C: since the latter requires small addi- 
tional computational cost in order to be awarded an additional star, our final 
suggestion is to choose according to the available computational resources. 

Obviously, the conclusions and the final suggestion in our case study do 
not constitute the general answer to the question about the best combination 
of hierarchical structure and spatio-temporal covariance function. For exam- 
ple. Model A3-1 and A3-2 could have a better prediction performance in a 
case study with a larger spatial domain and a longer time period, as well as 
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a thicker monitoring network. Moreover, their computational costs could be 
reduced by simplifying the structure of the dense covariance matrices, using 
for example covariance tapering techniques (Furrer et ai, 2006). Nonethe- 
less, the proposed set of criteria summarized in a simple table - jointly with 
the model discussion - makes the strategy applied in our PMio application 
portable to other environmental studies, where decision makers choose which 
model to implement. 

The choice of the model is also related to the size of the dataset used to 
estimate the parameters as well as the grid resolution for spatial mapping, 
since we adopt computing intensive statistical methods. The increasing avail- 
ability of large spatio-temporal datasets in many environmental fields gives 
rise to the so-called "big n" problem and to the infeasibility of matrix op- 
erations whose complexity increases in cubic order. To tackle this issue, we 
need to adopt efficient computational strategies. A solution may be given by 
the implementation of parallel linear algebra algorithms based on graphical 
processors (GPU), which is part of our ongoing research. 

Acknowledgements 

The authors would like to thank Veronica J. Berrocal, Alan E. Gelfand, 
Giovanna Jona Lasinio for the useful discussions and comments. 



31 



References 



Al-Awadhi, F. and Al-Awadhi, S. (2006). Spatial-temporal model for ambient 
air pollutants in the state of Kuwait. Environmetrics , 17, 739-752. 

Bande, S., Clemente, M., De Maria, R., Muraro, M., Picollo, M., Arduino, 
G., Calori, G., Finardi, S., Radice, P., Silibello, C., and Brusasca, G. 
(2007). The modelling system supporting Piemonte region yearly air qual- 
ity assessment. Proceedings of 6th International Conference on Urban Air 
Quality, Limassol, Cyprus, 27-29 March 2007 . 

Banerjee, S., Carlin, B., and Gelfand, A. (2004). Hierarchical Modeling and 
Analysis for Spatial Data. Monographs on Statistics and Applied Proba- 
bihty. Chapman and Hall, New York. 

Clark, J. (2005). Why environmental scientists are becoming Bayesians. 
Ecology Letters, 8, 2-14. 

Cocchi, D., Greco, F., and Trivisano, C. (2007). Hierarchical space-time 
modelling of PMio pollution. Atmospheric environment, 41, 532-542. 

Cressie, N. (1993). Statistics for Spatial Data. Wiley, New York. 

Cressie, N. and Huang, H. (1999). Classes of nonseparable, spatio-temporal 

stationary covariance functions. Journal of the American Statistical Asso- 
ciation, 94, 1330-1340. 

Fasso, A. and Cameletti, M. (2010). A unified statistical approach for simula- 
tion, modelling, analysis and mapping of environmental data. Simulation, 
86(3), 139-154. 

32 



Fasso, A., Cameletti, M., and Nicolis, O. (2007). Air quality monitoring 
using heterogeneous networks. Environmetrics , 18(3), 245-264. 

Finardi, S., De Maria, R., D'Allura, A., Cascone, C, Calori, G., and LoUob- 
rigida, F. (2008). A deterministic air quality forecasting system for Torino 
urban area, italy. Environmental Modelling and Software, 23(3), 344-355. 

Fuentes, M. (2006). Testing for separability of spatial-temporal covariance 
functions. Journal of Statistical Planning and Inference, 136, 447-466. 

Furrer, R., Genton, M., and Nychka, D. (2006). Govariance tapering for inter- 
polation of large spatial datasets. Journal of Computational and Graphical 
Statistics, 15(3), 502-523. 

Gneiting, T. (2002). Nonseparable, stationary covariance functions for space- 
time data. Journal of the American Statistical Association, 97, 590-600. 

Harrison, R., Stedman, J., and Derwent, D. (2008). New directions: Why are 
PMio concentrations in Europe not falling? Atmospheric Environment, 
42, 603-606. 

Huang, H., Martinez, F., Mateu, J., and Montes, F. (2007). Model com- 
parison and selection for stationary space- time models. Computational 
Statistics and Data Analysis, 51, 4577-4596. 

Kolovos, A., Yu, H., and Ghristakos, G. (2006). SEKS-GUI v.0.6 User Man- 
ual. Department of Geography, San Diego State University: San Diego, 
GA. 



33 



Pang, W., Christakos, G., and Wang, J. (2010). Comparative spatiotemporal 
analysis of fine particulate matter pollution. Environmetrics , 21, 305-317. 

Poll, A. and Cirillo, M. (1993). On the use of the normalized mean square 
error in evaluating dispersion model performance. Atmospheric Environ- 
ment, 27, 2427-2434. 

PoUice, A. and Jona Lasinio, G. (2010a). A multivariate approach to the 
analysis of air quality in a high environmental risk area. Environmetrics. 
DOI: 10.1002/env.l059. 

Pollice, A. and Jona Lasinio, G. (2010b). Spatiotemporal analysis of the 
PMio concentration over the Taranto area. Environmental Monitoring 
and Assessment, 162, 177-190. 

Sahu, S. and Mardia, K. (2005). A Bayesian Kriged-Kalman model for short- 
term forecasting of air pollution level. Journal of the Royal Statistical 
Society, Series C, 54, 223-244. 

Sahu, S. and Nicolis, O. (2007). An evaluation of European air pollution reg- 
ulations for particulate matter monitored from a heterogeneous network. 
Environmetrics, 20, 943-961. 

Sahu, S., Gelfand, A., and Holland, D. (2006). Spatio-temporal modeling of 

fine particulate matter. Journal of Agricultural, Biological and Environ- 
mental Statistics, 11, 61-86. 

Shaddick, G. and Wakefield, J. (2002). Modelling daily multivariate pollutant 
data at multiple sites. Journal of Applied Statistics, 3, 351-372. 

34 



Smith, R. and Kolenikov, S. (2003). Spatiotemporal modeling of PM2.5 data 
with missing values. Journal of Geophysical Resarch, 108(D24), 11-1,11- 
11. 

Tanner, M. A. (1996). Tools for Statistical Inference: Methods for the Explo- 
ration of Posterior Distributions and Likelihood Functions. Springer, New 
York. 

West, M. and Harrison, J. (1997). Bayesian Forecasting and Dynamic Mod- 
els. Springer. 

Wikle, C. K. (2003). Hierarchical models in environmental science. Interna- 
tional Statistical Review, 71, 181-199. 

Wikle, C. K., Berliner, L., and Cressie, N. (1998). Hierarchical bayesian 
space-time models. Journal of Environmental and Ecological Statistics, 5, 
117-154. 

Yu, S., Eder, B., Dennis, R., Chu, S., and Schwartz, S. (2006). New unbi- 
ased symmetric metrics for evaluation of air quality models. Atmospheric 
Science Letters, 7, 26-34. 

Zidek, J., Sun, L., Le, N., and Ozkaynak, H. (2002). Contending with space- 
time interaction in the spatial prediction of pollution: Vancouver's hourly 
ambient PMio field. Environmetrics , 13, 595-613. 



35 



A Details on covariance separability 

For Model B it holds that 



Cov {u {si, t) , u {sj,t')) = E {u (sj, t) u {sj, t')) — E {u {si, t)) E {u {sj,t')) 

= E iy{t)y{t')) - E (yit)) E (y(t')) + E ito{sMsj)) 
= Cov {y (t) , y {t')) + Cov {u{si),u{sj)) 

2 

I — p'^ 

where h = \\si — Sj\\ and / = \t — t'\. This means that the spatio-temporal 
covariance function is given by the sum of a purely temporal and a purely 
spatial covariance function. In particular, the temporal term is defined by 
the AR(1) structure of y (see (9)) with \p\ < 1. 

Turning to Model C we have that Cov {u {si,t) ,u {sj,t')) is equal to 
Cov {y {si,t) ,y {sj,t')). Moreover, recalling that any AR(1) process can be 
rewritten using an infinite-order moving average representation, it follows 
that 

(oo oo \ 

^p''u{si,t - k),^p'''u{sj,t' - k') j 
fe=0 k'=0 J 

oo 

= Yl {p^^^'Cov{u{s,,t-k),u{sj,t' -k'))^ 

k,k'=0 

1 — 

Hence the spatio-temporal covariance function can be rewritten in a separable 
multiplicative form where is a valid temporal correlation function if 
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\p\ < 1- 

B Derivation of posterior distributions 

Let us denote with \1/ the parameter vector to be estimated. Generally speak- 
ing the joint posterior distribution is given by 

/ (VI/, f/ I Z) oc / (Z I [/, VI/) f{U I vl/)/(vl>) (12) 

where the notation /(.) is used for the probability density function and Z 
and U denote, respectively, the collections of data z{si,t) and of the latent 
processes u{si,t). Note that independent prior distributions are chosen for 
the parameters, so that /(vE') = Hitli''*'' fi^i)- Conditionally on U the 
observations z{si,t) are serially independent and Eq.(12) can be rewritten 
as follows: 

T 

/ (VI/, [/ I Z) cx n / I U, ^) fiU I vI/)/(vI/) (13) 
t=i 

where Zf = {z{si,t), . . . , z{sd,t)y and / {Zt \ U, v&) can be derived through 
the measurement equation (1). The joint posterior distribution (13) is com- 
pletely specified when f{U \ v[f) is defined. Since this conditional distribution 
is specific for each model introduced in Section 2, we describe each case sepa- 
rately in the following subsections specifying in the subscript of v[f the model 
to which it refers. 

Note that, as described in Banerjee et al. (2004), in the case of hierar- 
chical models it is preferable, where possible, to marginalize the posterior 
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distribution over U, thus obtaining the following posterior distribution 

T 

f{^\Z)<xl[f{Z,\^)f{^). (14) 
t=i 

In the sequel, this marginal approach is used for all models except Model A2, 
where the marginalization prevents the use of the properties of the Kronecker 
product described in Section B.2. 

In order to implement the Gibbs sampling algorithm we derive the full 
conditional distributions and, when these are not available in an exact closed- 
form, we introduce a Metropolis-Hastings (MH) sampling step (which is the 
so-called Metropolis-within-Gibbs algorithm). 

B.l Model Al 

As specified in Section 2.1 the parameter vector for Model Al is given by 
= (/3, o'e, 6', erf,). Moreover, the following Gaussian (i-dimensional condi- 
tional distribution holds 

/(Zi|vI/^J~Ar,(X,/3,S^+,) 

where Xt = (X(si,t)', . . . ,X(srf,t)')' is the {d x k) covariate matrix and the 
variance-covariance matrix is defined as 

E^+, = a'^Ce (lis, - s,\\)^^^^^_^ + a%, (15) 



38 



with Cg (II Si - Sj\ 



)ij=i d ~ Cg{h) with elements evaluated through the spa- 
tial correlation function shown in (2.5). Then, it follows that the posterior 
distribution is given by 



1=1 



X IS, 



T 
2 



<+e\ ' exp 



1 ^ 

- 5^ (Z, - X,/3)' {Zt - X,(3) 



t=i 



Taking a Normal prior distribution for /3, that is /3 ~ Nk{0, So), straight- 
forward calculation yields a Gaussian full conditional distribution for /3 with 
mean AB' and covariance A where A = ^Sq ^ + ^^^^ X^'S^^^X^ j and 
B = YlJ^i (Zj'S^^gXt) . In order to estimate the remaining parameters, the 
MH algorithm is used. 

B.2 Model A2 

Let us denote with Ut = {U{si,t), . . . ,U (s^, t))' the latent process at time t. 
Moreover, let U = {Ui, . . . , Ut} be the {dTx 1) random effects vector blocked 
by sites'^ and Z = {Zi, . . . , Zt} the [dT x 1) data vector. The distribution 
of U is Gaussian with mean vector given by X/3 where X = {Xi, . . . , Xrp} is 
the {dT X k) array of covariates and /3 is the corresponding coefficient vector. 
The variance-covariance matrix of U is defined as 

S[/ = alCe {\\s, - Sj\\,\t - t'\) i,j=i,...,d = a^C^ (h, 1) 

t,t'=l,...,T 

^Here, and in the sequel, braces are used for column stacking of the vectors involved. 
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where Ce — \t — t'\) i,j=i,...,d = Ce (h, 1) with elements evaluated through 

t,!t'=l'...',T 

the separable spatio-temporal correlation function given in Eq.(6). 

Thanks to the separability property, the correlation matrix Cq (h, 1) can 
be written as Cq (h, 1) = Ce^ (1) ® Ce^ (h), where Cg^ (1) and Cg^ (h) denote 
respectively the temporal and spatial correlation matrix. Thus, the param- 
eter vector for Model A2 is '^a2 = o"^, a^, ^^i, 6'2) and the joint posterior 
distribution is given by the following equation: 



Td 



f{^A2,U\Z) oc ^ exp 



2al 



^ {Z-U)'{Z-U) 



X 



{aiy- |C,(h,l)r^exp 



2a2 



^ {U-X/3yCeih,\)-\U-X/3) 



dim(<i'A2) 
i=l 

where, thanks to the Kronecker product properties, |C6i(h, 1)| = |Ce^(l)|'^ |C£)2(h)|"^ 
and Ce(h,l)-i = ® Ce,{\i)-\ 

Taking a Normal prior distribution for /3, that is /3 ~ A^a;(0, Sq), from 
Eq.(16) it follows that the full conditional distribution for /3 is Gaussian 
with mean AB' and variance A where A = ^Sg ^ + -^X'Cgih, l)^^xj and 
B = -\U'Cg{h,\)~^X . Moreover, for the variance parameter we have the 
following conditional distributions: 



alr^IG{a + ^,b+^iU- XpyCgih, \)-\U - X/3) 



[a+'^,b+^-{Z-UnZ-U)^ 



where a and b denote the hyperparameters of the corresponding prior Inverse- 
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Gamma distributions IG{a, b). For the estimation of 6i and 62 we adopt the 
MH algorithm. Finally, let us recall that the spatio-temporal process U is 
sampled using the en bloc procedure described in Sahu et al. (2006). 



B.3 Model A3 

Using the notation introduced in the previous section for Model A2, the 
posterior distribution for Model A3 is 



/(^^3 I Z) oc |S^+,|~2exp 



dim{'i/A'i) 



n /(^^3.) 

i=l 

(16) 

where ^^^3-1 = (/3, (x^, al, Oaz-i) for Model A3-1 and ^^3-2 = (/3, fx^, al, ^^3-2) 
for Model A3-2. The variance-covariance matrix in (16) is given by 

T.^+e = (ylCe{h,\) + alhT 

where the elements of Ce (h, 1) are evaluated through the nonseparable cor- 
relation functions Cg{h,l) given in Eq. (2.7) of (7). 

From Eq.(16) we can easily obtain that the full conditional distribution for 
(3 is Gaussian with mean AB' and variance A where A = (Sq ^ + X'S^j^X) 
and B = Z'S^j.X, assuming a Normal prior distribution for j3, that is 
P ~ A'"fc(0, So). All the remaining parameters are estimated through the 
Metropolis-Hastings algorithm. 
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B.4 Model B 

Given the hierarchical structure of Model B (see Equations (8) and (9)), 
Eq.(14) can be rewritten as 

T 

f{^B\Z) = l[f [Zt I F, ^b) f {Y I ^b) f (^b) (17) 
t=i 

where ^b = cr"^, cr^, o"^, d, p) and Y = {Yi, . . . , Yt}, with Yt denoting the 
constant in space unidimensional latent process at time t, with t = 1, . . . ,T. 
The Markovian structure defined in Eq.(9) leads us to rewrite the term 
/ (Y I ^b) in (17) as follows 

T 

f{Y\^B) = f {Yo \^B)l[f {Yt I Ft-i, ^b) ■ (18) 

Moreover, the conditional probability distributions that occur in the previous 
equations are the following: 

fiZt\Y,^B) ~ N,{XtP + KYt,E^+,), 
f{Yt\Yt_,,^B) ~ N^{pY^_^,a^^), 
/(FoI^b) ~ iVi(0,4), 

where S^^+g is defined by Eq.(15), as for Model Al, and is a (c? x 1)- 
dimensional ones vector. Taking the logarithm of Eq. (17), the posterior 
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distribution for Model B is given by 



f{^B\Z) oc |S 



(jj+e I 



exp 



1 ^ 



t=i 



X 



(4)-^exp(--Fo^ 



X (aj) 2 exp 



2cr2 

t=i 



n /(*^ 

1=1 



Straightforward calculation yields a Gaussian full conditional distribution 
for (3 with mean AB' and covariance A where A = (^q^ + Xl^i ^t^oj+e^t j 
and i? = Ylt=i {Z't^Z+eXt — {KYtyT,^]_^Xt^ , assuming a Normal prior distri- 
bution for P, that is /3 ~ A'"fc(0, So). For the estimation of we consider an 
Inverse Gamma prior IG{a, b) and sample from the following full conditional 
distribution 

~ + a, 6 + ^ (F, - pFt-i)' j . 

For the other parameters, a'^,a'^,6 and p, we adopt the MH algorithm, 
while the latent process Y is sampled using the Forward Filtering Backward 
Sampling (FFBS) as described in West and Harrison (1997). 
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B.5 Model C 

With regard to Model C, Eq.(17) and Eq.(18) still hold substituting with 
and using the following conditional distributions: 



f{Yo\^c) ~ iVrf(0,So), 



where Yt = . . . ,Y{sd,t))' and P = pid is the AR{1) transition ma- 



trix, Se = a^Id and T,^ 



Si - s 



J \\H,j=l,...,d 



(T'^Ce{h), with elements 



evaluated through the spatial correlation function appearing in (5). It follows 
that the posterior distribution is 



f{^c\Z) (X (a^)-^ exp 

_ 1 

X |So| 2 exp 



1 ^ 

5^ (Zi - Xt/3 - Yt)' {Zt - Xtl3 - Yt) 
^ t=i 



1, 



X {aiy- |C,(h)r^exp 



(r, - PF,_0' C.(h)-i (r, - PYt^r) 

^ t=l 



X 



n f(^c.)- 



i=l 



Straightforward calculation yields a Gaussian full conditional distribution 



for /3 with mean AB' and covariance A where A = ^Sq ^ + ^ Ylt=i -^'t-^t 
and B = Ylt=i i^'t-^t — ^t-^t), assuming a Normal prior distribution for 
P, that is /3 ~ A'"fc(0, Sq). Moreover, for the variance parameters we have the 
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following conditional posterior distributions: 



„2 



< ~ IG 



where a and h denote the hyperparameters of the corresponding Inverse- 
Gamma priors IG{a,b). The parameters 6 and p are estimated using the 
MH algorithm while latent process Y is sampled using the FFBS algorithm. 



C Spatial prediction 

Spatial prediction at a new location sq and time to (with 1 < to ^ ^) is 
based on the posterior predictive distribution of z{so,to) which is given by 

/ (z(so, to) I ^) = j f (ziso, to), m(so, to), ^ I Z) duiso, to)d^. (19) 

Note that for all the models described in Section 2 except Model A2, we 
can marginalize over U and (19) reduces to the following equation 

/ {z{so, to)\Z) = J f {z{so, to), VI/ I Z) rfvl/ = J f (^(so, to) I Z, VI/) / (VI/ I Z) d^H. 

(20) 

In practice, in order to obtain a prediction using MCMC methods, the 
posterior predictive distribution (20) is sampled by composition (Tanner, 
1996). This means that we first draw from the posterior f {"^ \ Z) which then 
allows us to draw from / {z{so, to) | Z,"^). Therefore, to define / {z{so, to) | Z, v[') 
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we refer to the standard multivariate Normal theory. In fact, from the joint 
distribution of the data Z and ^(scto), that is 



Z 



\ 



N, 



dT+l 



/il 



En E 



12 



E'^2 ^22 



it derives that 



z {so, to) \Z,^^N, (/i2 + E'l^En' {Z - ^i^) , E22 - E;2En'Ei2) • (21) 

Note that En, the variance-covariance matrix of the data Z, has dimension 
{dT X dT) while E22 is a scalar. Moreover, the covariance vector E12 has 
dimension {dT x 1) and its generic element is Cov {z {si,t) , z {so,tQ)) {i = 
1, . . . ,d]t = 1, . . . ,T). In the following subsections we report schematically 
for each model the elements ^i, fi2, En, E22 and E12 of (21) referring to 
the notation introduced in Section B. Moreover, let h denote the vector of 
distances — so|| with i = 1, . . . ,d, and 1 the vector of temporal lags — toll 
with t = 1, . . . ,T. Thus, for example, the term Cq (h^ is the correlation 
vector whose generic element is (||sj — so||), with i = 1, . . . ,d. 

C.l Model Al 

For Model Al it holds that /ii = X(3, where X = {Xi, Xt} is the {dTxk) 
array of covariates, and En is a block diagonal matrix with blocks given by 
E^_i_£ and defined by (15). Moreover, fi2 = x{so, to)/3, where x{so, to) is the k- 
dimensional covariate vector observed at time to at site Sq, and E22 = c^S + '^e- 
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Finally, let us denote with Odxi a (i-dimensional row vector of zeros and with 
Ce ( h) the correlation vector. Hence the covariance vector S12 is defined by 



^12 



Odxi, • • • , c^C'e ( h ) , . . . , Orfxi 



t=i 



t=to 



t=T 



where it can be seen that the d non-zero elements occur at the time point to. 
C.2 Model A2 

For Model A2 it is not possible to marginalize over U and thus for defining 
the posterior predictive distribution we refer to (19) and rewrite it as follows: 

/ {z{so, to)\Z) = J f {z{so, to) I u{so, to), ^) f {u{so, to) I U, ^) f{U, ^ I Z)du{so, to)dUd^ 

where / {z{so, to) \ u{so, to), ~ A^i (m(so, to), al). To define / {u{so, to) \ U, 
we consider the following joint distribution 



v 



u 

u{so,to) 



N, 



dT+l 



LV 



Xl3 
x{so,to)/3 



( 



12 



y 



12 



cr. 



with the covariance vector S12 given by 



S12 = ot [Ce, (It - to|)i=, ® Ce, {\\s, - so\\),^,A = [CeAl] ® CeAh 
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Now it can be easily derived that 

( ,^\TrM, AT ( ( , ,._^n^MMH^^^M 2 s;,c,(h,i)-^Si 2 

u (so, to) I f/, ^ ~ iVi x{sq, to)l3 H , 

(22) 

Using composition sampling, once a value from (22) is drawn, it is possible 
to obtain a prediction generating a value from / (z(so,to) I w(so,to), ^)- 

C.3 Model A3 

For Model A3 we have that /ii = X/3, where X = {Xi, . . . , Xt} is the (dTxk) 
array of covariates, and Sn = "Euj+e is the variance-covariance matrix defined 
in Section B.3. Moreover, /i2 = x(so,to)/3 and S22 = cr^ + c^e- Finally, let 
us denote with Ce (h, \t — to\j = Ce {\\si — so\\, \t — to|)j=i ^ the {d x 1)- 
dimensional correlation vector at time t evaluated through the nonseparable 
correlation function Ce{h,l) given in Eq.(7). It follows that the covariance 
vector E12 is given by 

S12 = al (^Ce (h, |1 - tol) , . . . , (h, |t - to|) , . . . , (h, |T - to|)) • 
C.4 Model B 

For Model B Eq.(19) needs to be rewritten as follows in order to consider Y: 



f {z{so, to)\Z) = J f {z{so, to) I y{so, to), ^) / {y{so, to) I Y, ^) f{Y, ^ \ Z)dy{so, to)d^. 

(23) 

As described in Section B.4, the process Y for a given time point is con- 
stant over space so that y(so,to) = l/(to) for any so, and y{to) is esti- 
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mated using the FFBS algorithm. To draw a value from the distribution 
/ {z{so, to) I y{so, to), we refer to Eq.(21) with fii = Xf3 + Y (where X = 
{Xi, . . . , Xt} and r = {Fi, . . . ,Yt}), fi2 = x{so, to)/? + y{to), ^22 = + (^s 
and Sii is a block diagonal matrix with blocks given by Sa;+e and defined by 
(15). Finally, the covariance vector S12 is given by 



^12 




t=to 




as in the case of Model Al of Section C.l. 



C.5 Model C 

With reference to Model C, Eq.(23) still holds but the term f{y{so, to) I Y, *) 
has to be derived appropriately. To this purpose, consider the following joint 
distribution: 



/ 

V 



Y 

y{sQ,to) 



N, 



dT+l 



J 



/ 



pY 

pyiso,to 



^11 ^12 



v 



y 



12 



where Y = {Yq, . . . ,1^-1}, the variance-covariance matrix of F is a block 
diagonal matrix with blocks given by T,^ = cr'^Cg (^hj , and the covariance 
vector II12 is 

Orfxi,- • • (h ),..., Ofl 



n2 



t=l 



t=to 



t=T 
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as in the case of Model Al and of Model B. Thus, it can be easily derived 
that 

y (so, to) I r, ^ ~ iVi (py(so, to - 1) + S'l^Sri^ {y - pr) , al - S'l^Sri'Sis) . 

(24) 

Using composition sampling for the posterior predictive distribution (23), 
once a value from (24) is drawn, we can obtain a prediction generating a value 
from / (z(so,to) | ?/(so, to), ^). 
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'; -r 


Model A3-1 
Model A3-2 


ip(x) = (ax" + b)/{b{ax'^ + 1)) 
ip{x) — {ax°' + ly 


V?(.t) 
ip{x) 


= CXp( — C,T^) 

= (1 + cx^)-" 



Table 1: Function and ip{x) that define the nonseparable spatio- 

temporal correlation function of Model A3-1 and A3-2. 





Model Al 


Model A2 


Model A3-1 


Intercept 


3.881 


(3.815 


3.946) 


3.938 


(3.868 


4.003) 


3.935 


(3.848 


4.021) 


A 


-0.155 


(-0.169 


-0.140) 


-0.142 


(-0.220 


-0.062) 


-0.162 


(-0.195 


-0.128) 


UTMX 


-0.102 


(-0.124 


-0.080) 


-0.110 


(-0.197 


-0.027) 


-0.112 


(-0.155 


-0.069) 


UTMY 


-0.079 


(-0.097 


-0.061) 


-0.072 


(-0.121 


-0.015) 


-0.076 


(-0.108 


-0.044) 


WS 


-0.091 


(-0.109 


-0.073) 


-0.078 


(-0.107 


-0.054) 


-0.086 


(-0.101 


-0.072) 


HMIX 


-0.041 


(-0.047 


-0.035) 


-0.075 


(-0.101 


-0.046) 


-0.050 


(-0.056 


-0.044) 


TEMP 


-0.262 


(-0.309 


-0.217) 


-0.103 


(-0.166 


-0.051) 


-0.138 


(-0.171 


-0.103) 


PREC 


-0.085 


(-0.107 


-0.062) 


-0.105 


(-0.127 


-0.082) 


-0.090 


(-0.105 


-0.076) 


EMI 


0.063 


(0.053 


0.073) 


0.083 


(0.038 


0.133) 


0.054 


(0.031 


0.076) 




Model A3-2 


Model B 


Model C 


Intercept 


3.928 


(3.853 


4.001) 


3.955 


(3.773 


4.119) 


3.956 


(3.672; 4.197) 


A 


-0.157 


(-0.187 


-0.127) 


-0.172 


(-0.186 


-0.157) 


-0.152 


(-0.187; -0.118) 


UTMX 


-0.108 


(-0.145 


-0.071) 


-0.113 


(-0.130 


-0.095) 


-0.092 


(-0.173; 0.001) 


UTMY 


-0.074 


(-0.102 


-0.047) 


-0.073 


(-0.086 


-0.060) 


-0.129 


(-0.196 


-0.059) 


WS 


-0.090 


(-0.105 


-0.076) 


-0.085 


(-0.100 


-0.071) 


-0.032 


(-0.052 


-0.012) 


HMIX 


-0.050 


(-0.056 


-0.044) 


-0.042 


(-0.048 


-0.036) 


-0.024 


(-0.041 


-0.007) 


TEMP 


-0.128 


(-0.165 


-0.091) 


-0.308 


(-0.349 


-0.267) 


-0.218 


(-0.296 


-0.119) 


PREC 


-0.093 


(-0.107 


-0.078) 


-0.090 


(-0.106 


-0.074) 


-0.040 


(-0.060 


-0.019) 


EMI 


0.058 


(0.037 


0.079) 


0.061 


(0.050 


0.072) 


0.049 


(0.030 


0.067) 



Table 2: Posterior estimates (mean and 95% credible interval in brackets) of 
the covariate coefficient vector /3. 
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Model Al 


Model A2 


Model A3-1 




0.237 (0.198; 0.283) 
0.071 (0.066; 0.077) 


0.247 (0.229; 0.267) 
0.015 (0.013; 0.018) 


0.077 (0.070; 0.083) 
0.034 (0.028; 0.039) 




Model A3-2 


Model B 


Model C 


ol 


0.077 (0.071; 0.084) 
0.034 (0.029; 0.040) 


0.063 (0.053; 0.073) 
0.051 (0.042; 0.060) 
0.065 (0.052; 0.081) 


0.950 (0.822; 1.091) 
0.013 (0.011; 0.015) 



Table 3: Posterior estimates (mean and 95% credible interval in brackets) of 
the variance parameters. 





Model Al 


Model A2 




0.0033 (0.0025; 0.0042) 










0.492 


(0.473; 0.511) 






0.032 


(0.029; 0.035) 




Model B 


Model C 


9 


0.050 (0.040; 0.061) 


0.0022 


(0.0019; 0.0024) 


P 


0.831 (0.738: 0.919) 


0.65 1 


(0.629: 0.677) 



Table 4: Posterior estimates (mean and 95% credible interval in brackets) of 
the correlation function parameters for Model Al, A2, B and C. 





Model A3-1 


Model A3-2 


a 


0.058 


(0.045; 0.074) 


4.088 


(3.533; 4.708) 


a 


0.736 


(0.650; 0.831) 


0.800 


(0.743; 0.859) 


b 


0.047 


(0.036; 0.058) 






c 


0.549 


(0.448; 0.659) 


0.982 


(0.823; 1.158) 


7 


0.110 


(0.089; 0.132) 


0.184 


(0.156; 0.212) 


u 






0.801 


(0.743; 0.864) 


T 






0.4756 


(0.434; 0.517) 



Table 5: Posterior estimates (mean and 95% credible interval in brackets) of 
the correlation parameters for Model A3-1 and A3-2. 
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